(57) The present invention provides a system and
method for creating virtualized storage in a storage area 
network using distributed table-driven input/output mapping.
The present invention distributes the virtualization
mapping in multiple parallel mapping agents that are
separate from a controller. This allows the performance- 
sensitive mapping process to be parallelized and dis- 
tributed optimally for performance, while the control of 
the mapping may be located in a controller chosen for 
optimal cost, management, and other implementation 
practicalities. The mapping agents store the virtual mapping
tables in volatile memory, substantially reducing
the cost and complexity of implementing the mapping
agents. The controller is responsible for persistent stor- 
age of mapping tables, thereby consolidating the costs 
and management for persistent mapping table storage 
in a single component. Distributed virtualization also al- 
lows the controller to manage multiple virtual disks used 
by multiple host systems, and allows a single virtual disk 
to be shared by multiple host systems. The mapping 
agents preferably do not interact with other mapping
agents, thereby improving the scalability of the virtual
storage system and the virtual storage system's tolerance
of component failures.
Description

Related Applications 

[0001] This application claims priority from U.S. Pro- 
visional Application Nos. 60/209,109 and 60/209,326, 
filed on June 2, 2000, the disclosures of which are here- 
by incorporated by reference in full. 

Field Of The Invention 

[0002] The present invention is a virtualized data stor- 
age system using distributed tables with input/output 
mappings. 

Background Of The Invention 

[0003] A stand-alone computer generally connects to
data storage devices, such as hard disk, floppy disk, 
tape, and optical drives, via a fixed communication 
channel or bus, as schematically illustrated in FIG. 1A. 
While the communication channel allows high-speed 
data transfers, access to the storage device is limited to 
the stand-alone computer. 

[0004] Over time, it has become necessary for multi- 
ple devices to connect to a storage device so that mul- 
tiple users may share data. As a result, developers cre- 
ated a storage area network (SAN) consisting of multi- 
ple, interconnected, proximately located devices, as 
schematically illustrated in FIG. 1B. The SAN typically 
includes one or more network access servers that ad- 
minister the interaction of the devices and the operation 
of the network, including data storage devices that are 
accessible by the other devices in the SAN. The devices
may be connected through Small Computer Systems In- 
terface (SCSI) buses to establish parallel communica- 
tion channels between the devices. In SCSI systems, a 
unique Logical Unit Number (LUN) is used to designate 
data storage locations, where each location is a sepa- 
rate storage device or partition of a storage device. Each 
LUN is further divided into blocks of small, easily man- 
ageable data sizes. By combining LUN zoning with port 
zoning to implement storage sharing, the SAN can have 
centralized, distributed data storage resources. This 
sharing of data storage resources across the SAN sub- 
stantially reduces overall data and storage manage- 
ment expenses, because the cost of the storage devices 
may be amortized across multiple devices. The use of 
centralized, distributed data storage also provides val- 
uable security features because the SAN may limit the 
ability of a device to access data in a particular zone. 
The performance costs of using consolidated data storage
configurations within the SAN are substantially reduced
through the use of Fibre Channel connections between
the LUNs and the other network devices to
achieve high-speed data input and output (I/O) operations.
The SAN operates, in effect, as an extended and
shared storage bus between the host and the storage
containers to offer, among other things, improved storage
management, scalability, flexibility, availability, access,
movement, and backup. The centralization of data
storage, however, presents new problems, including issues
of data sharing, storage sharing, performance optimization,
storage on demand, and data protection.
[0005] Because of these issues, developers have re- 
cently added a virtualization layer to the SAN hierarchy. 
The virtualization layer refers to software and hardware
components that divide the available storage spaces into
virtual disks or volumes without regard to the physical
layer or topology of the actual storage devices. Typically,
virtual volumes are presented to the server operating
system as an abstraction of the physical disk and are
used by the server as if the virtual volumes were physical
disks. The virtual volumes are not LUNs on a storage
array. Instead, the virtual volumes may be created, expanded,
deleted, moved, and selectively presented,
independent of the storage subsystem. Each virtual volume
may have different characteristics and may therefore be
expanded as the available storage expands. The SAN
virtualization presents a single pool of SAN resources and
a standard set of SAN services to applications residing on
a broad range of operating platforms.

[0006] However, SANs using conventional disks and
storage subsystems incur substantial system and storage
management expenses due to the tight coupling between
the computer system and the storage. For
these and other reasons, existing SAN technologies
also have limited scalability. Furthermore, a key remaining
issue for SAN virtualization is the distribution of
storage resources among the various devices of the
SAN.

[0007] Accordingly, there exists a need for an improved
data storage system that addresses these and
other needs in the SAN. One proposed class of storage
system uses a subsystem to further improve the performance
of the SAN by separating control and access
functions from other storage functions. In such a class,
access functions govern the ability to use and manipulate
the data on the SAN, and control functions relate to
the administration of the SAN such as device monitoring,
data protection, and storage capacity utilization.
Separating control and access functions from other storage
functions pulls the virtualization function out of the
server and onto the SAN. In addition to the virtualization
of the storage provided by traditional, server-bound
implementations, the virtualization layer on the SAN enables
the automation of important data movement functions,
including the copying, movement, and storage of
data through the creation and expansion of virtual volumes.

[0008] Toward this purpose of separating control and
access functions from other storage functions, currently
proposed virtualized storage systems consolidate control
and mapping functions in a centralized location such
as in the host, in a storage controller, or in a special
virtualization component in the SAN, as illustrated in FIGS.
2A-2C respectively. Centralizing the control and mapping
functions avoids problems associated with distributed
mapping. However, storage virtualization schemes
that are centralized in one component suffer from various
scaling limitations, including the inability to scale
to multiple computer systems, multiple storage systems,
and large storage networks with adequate performance.
[0009] Improved scalability may be achieved through
a distributed virtualized storage system. However, attempts
to form distributed virtualized storage systems
using known technologies, such as array controllers,
to distribute the mapping used in the virtual storage
rely on simple algorithmic distribution mechanisms, e.g.,
Redundant Array of Independent Disks (RAID), that
limit data management flexibility. Furthermore,
the known technologies do not address the needs of a
scalable virtual storage system, including issues of
storage sharing, data sharing, performance optimization,
storage system delays, and data loss risks.

SUMMARY OF THE INVENTION 

[0010] In response to these needs, the present inven- 
tion provides a system and method for creating virtual- 
ized storage in a storage area network using distributed 
table-driven input/output mapping. The present inven- 
tion distributes the virtualization mapping in multiple par- 
allel mapping agents that are separate from a controller. 
This configuration allows the performance-sensitive 
mapping process to be parallelized and distributed op- 
timally for performance, while the control of the mapping 
can be located in the controller chosen for optimal cost, 
management, and other implementation practicalities. 
The mapping agents store the virtual mapping tables in 
volatile memory, substantially reducing the cost and 
complexity of implementing the mapping agents. The 
controller is responsible for persistent storage of map- 
ping tables, thereby consolidating the costs and man- 
agement for persistent mapping table storage in a single 
component. Distributed virtualization also allows the 
controller to manage multiple virtual disks used by mul- 
tiple host systems, and even allows a single virtual disk 
to be shared by multiple host systems. The mapping 
agents preferably do not interact with other mapping 
agents, thereby improving the scalability of the virtual 
storage system and the virtual storage system's toler- 
ance of component failures. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0011] These and other advantages of the present invention
are more fully described in the following drawings
and accompanying text in which like reference
numbers represent corresponding parts throughout:

FIGS. 1A-1B [PRIOR ART] are known systems for
connecting a host to a storage device;

FIGS. 2A-2C [PRIOR ART] are known virtualized
storage area networks;

FIGS. 3A-3B are schematic illustrations of a distributed
virtual storage area network in accordance
with embodiments of the present invention; and

FIGS. 4A-4B are schematic illustrations of a mapping
table for use in the distributed virtual storage
area network of FIG. 3 in accordance with an embodiment
of the present invention.

DETAILED DESCRIPTION OF A PREFERRED 
EMBODIMENT 

[0012] As illustrated in FIGS. 3A-3B and 4A-4B, the
present invention provides a virtualized storage area
network (SAN) system 100 using one or more distributed
mapping tables 200, as needed to form one or more
virtual disks 150 for input/output (I/O) operations between
the hosts 140 and storage containers 160. In particular,
the table 200 contains a mapping that relates a
position in the virtual disk 150 with the actual location
on the storage containers 160. The specific contents of
the table 200 are described in greater detail below.
[0013] The present invention covers an improved
storage area network (SAN). The invention can therefore
be applied to any known storage network 130. Within
the SAN, it should be appreciated that the storage
containers 160 are known and may refer to any type of
present or future known programmable digital storage
medium, including but not limited to disk and tape drives,
writeable optical drives, etc. Similarly, the hosts 140 may
be any devices, such as a computer, printer, etc., that
connect to a network to access data from a storage container
160.

[0014] Likewise, the storage network 130 is also intended
to include any communication technology, either
currently known or to be developed in the future, such
as the various implementations of Small Computer Systems
Interface (SCSI) or Fibre Channel. This distributed
virtualization is most useful in environments where a
large amount of storage is available and connected using
some sort of "storage network" infrastructure. In one
preferred implementation, the storage network 130 is
based on Switched Fibre-Channel connected storage.
However, nothing in the design of the system 100 precludes
its use on other types of storage networks 130,
including storage networks that are not yet invented.

[0015] The hosts 140 issue I/O operation commands
to the virtual disks 150, and in response, mapping
agents 110 access the table 200. Although the agents
110 are generally associated with the hosts 140, the
agents 110 in this way isolate the table 200 from general
host 140 access. Preferably, each of the hosts 140 has
a separate agent 110, so that each host has a separate
mapping table 200. Alternatively, the system 100 could
be configured so that more than one host 140 connects
to an agent 110. If multiple hosts 140 connect to the
same agent 110, the hosts 140 share access to the particular
table 200.

[0016] The agent 110 stores the mapping table 200 in
a volatile memory 111, typically DRAM. As a result, if
one of the agents 110 shuts down or loses power, that
agent 110 loses its copy of the table 200. For instance,
if the mapping agent 110 is embedded in the host system
140 and takes its power from that host system, as
would be the case for a backplane card that serves as
the mapping agent, the host 140 may cause the mapping
agent 110 to shut down by eliminating power to the
agent 110. However, by storing the table 200 in volatile
memory, the table can be rapidly accessed
and modified on the agents 110. Storing the
mapping table 200 in volatile memory has the further
advantage of substantially reducing the cost and complexity
of implementing the agents 110 as mapping components.
Overall, the agents 110 allow the performance-sensitive
mapping process to be parallelized and distributed
optimally for performance.
[0017] The system 100 further comprises a controller
120 that, although separate from the agents 110, administers
and distributes the mapping table 200 to the
agents 110. Control of the mapping is centralized in the
controller 120 for optimal cost, management, and other
implementation practicalities. The controller 120 further
stores the table 200 in a semi-permanent memory 121
so that the controller 120 retains the table 200 even after
a power loss. The semi-permanent memory 121 is preferably
a magnetic disk because of its high storage capacity
and its ability to support fast, frequent writes. The
controller 120 may alternatively store the table 200 using
other forms of programmable storage such as writeable
optical media and electronically programmable memories.
The controller 120 thus continues to store the table
200 even if the controller 120 shuts down or loses power.
[0018] In this way, the responsibility for persistent
storage of the mapping tables 200 lies in the controller
120, consolidating both costs and complexity. The exact
design of the controller 120 is not a subject of this disclosure.
Instead, this disclosure focuses on the structure
of the overall system and the interfaces between the
mapping agent 110 and the controller 120. Accordingly,
it should be appreciated that any controller, as known in
the art of digital information storage, may be employed
as needed to implement the present invention. Within
this framework, each of the agents 110 preferably interacts
with only the controller 120 and not with the other
agents 110. As a result, the system 100 is highly scalable
and tolerant of component failures.
[0019] As described below, the interactions of the
controller 120 and the agents 110 are defined in terms
of functions and return values. In one embodiment of
the virtual mapping system 100 illustrated in FIG. 3A,
this communication is implemented with messages on
some sort of network transport, such as a communication
channel 132. In another implementation of the system
100 depicted in FIG. 3B, the communication channel
132 is the storage network 130 itself. Any suitable
technique may be used to translate commands, faults,
and responses to network messages. The communication
channel 132 may employ any type of known data
transfer protocol such as TCP/IP. The particular interactions
between the functions and activities of the controller
120 are described in greater detail below.
[0020] FIGS. 4A-4B schematically illustrate the contents
of the table 200. As described above, the table 200
contains entries 210 that include a mapping between
one or more virtual disk segments 220 and storage
locations 230 on the storage containers 160. The
storage locations 230 identify the particular storage container
160 and part of the storage container 160 that corresponds
to the virtual disk index 220. The form for the
storage locations 230 must be appropriate for the storage
network being used. In a SCSI network, each of the
storage locations 230 includes a LUN identifier 233 and
a block identifier 235, also called an offset. All of the
other fields in a mapping table entry 210 are simple integers
or binary state values.

[0021] As depicted in FIG. 4A, the mapping table 200
may have one entry 210 per each "disk block" of the
virtual disk 220. While possible to build, this would result
in huge mapping tables and highly fragmented mapping,
both of which introduce undesirable performance degradations.
In another embodiment, each mapping table
entry 210 represents a variable sized group of contiguous
virtual disk blocks that map to contiguous blocks on
one of the physical storage containers 160. This configuration
of the table 200 offers greater mapping flexibility
and dense mapping structures, but introduces algorithmic
complexity in managing the variable sized extents
and greater map entry lookup costs.

[0022] In response, the table 200 of a preferred embodiment
uses mapping table entries 210, each having
a fixed number of contiguous blocks ("segments")
on the virtual disk 150 that map to one storage container
160, as depicted in FIG. 4B. In this embodiment, each
of the entries 210 contains a virtual disk segment 220
instead of a virtual disk block. The block identifier 235
likewise identifies a corresponding segment of actual
storage blocks. While FIG. 4B illustrates the table 200
identifying an entire range of values for the mapping table
entry 220 and the block identifier 235, the table 200
could likewise identify only the beginning or end block
where the size of the actual and virtual storage segments
is otherwise defined. While this configuration of
FIG. 4B for the table 200 is possibly not as dense as variable
sized extent mapping, the configuration offers the
simplest and highest performance map access and
space management. Regardless of the specifics of the
table 200, the table 200 must map a virtual disk segment
220 to each physical storage block involved in I/O operations.
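
By way of illustration only, the fixed-size segment mapping of FIG. 4B might be sketched as follows. The names SEGMENT_BLOCKS, StorageLocation, and lookup, the segment size, and the table contents are assumptions made for this sketch and do not appear in the disclosure. The sketch stores only the beginning block of each segment, which the paragraph above notes is sufficient when the segment size is otherwise defined.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

SEGMENT_BLOCKS = 2048  # hypothetical segment size, in blocks


@dataclass(frozen=True)
class StorageLocation:
    lun: int     # LUN identifier (233 in the text)
    offset: int  # first physical block of the segment (block identifier 235)


def lookup(table: Dict[int, StorageLocation], virtual_block: int) -> Tuple[int, int]:
    """Translate a virtual disk block into (lun, physical_block)."""
    segment, within = divmod(virtual_block, SEGMENT_BLOCKS)
    loc = table[segment]  # a KeyError means the segment is not mapped
    return loc.lun, loc.offset + within


# Illustrative table: segment index -> storage location
table = {0: StorageLocation(lun=3, offset=40960),
         1: StorageLocation(lun=7, offset=0)}
print(lookup(table, 5))     # block 5 lies in segment 0
print(lookup(table, 2050))  # block 2050 lies in segment 1
```

Because every segment has the same size, the segment index is obtained with a single division, which is one way to read the statement that this configuration offers the simplest map access.
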

[0023] In a preferred embodiment, the system 100
has multiple tables 200, each having different mappings
between a virtual disk 150 and the storage containers
160. In this way, different hosts 140 may have different
access to the same storage container 160. Where the
table 200 does not include one of the storage locations
230, hosts 140 using this table (i.e., the hosts 140 connecting
to the agent 110 that stores this table) cannot
access information stored at the storage location. In
fact, the host 140 will not realize that this storage location
230 exists.

[0024] During operation, the host 140 issues an I/O
operation (e.g., read or write) to some block or blocks
on a virtual disk 150. Each virtual memory block is represented
in the mapping table 200, either as an individual
entry or as part of a virtual disk segment 220. Each
block contained in the I/O operation is mapped to the
appropriate location on the storage container 160. The
mapping agent 110 issues a corresponding I/O operation
to the storage container 160. The I/O operation
results are then collected and presented as a completed
operation on the virtual disk 150.
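
As a non-limiting sketch of the operation just described, a host I/O that spans several virtual disk segments may be divided into one physical extent per segment before being issued to the storage containers. The function name split_io, the eight-block segment size, and the contents of TABLE are hypothetical and chosen only to keep the example small.

```python
from typing import Dict, List, Tuple

SEGMENT_BLOCKS = 8  # hypothetical, small so the example is easy to follow

# segment index -> (lun, first physical block); values are illustrative
TABLE: Dict[int, Tuple[int, int]] = {0: (1, 100), 1: (2, 500), 2: (1, 300)}


def split_io(start: int, count: int) -> List[Tuple[int, int, int]]:
    """Return (lun, physical_block, length) extents covering a virtual block range."""
    extents = []
    block, remaining = start, count
    while remaining > 0:
        segment, within = divmod(block, SEGMENT_BLOCKS)
        # Never cross a segment boundary within a single extent.
        length = min(remaining, SEGMENT_BLOCKS - within)
        lun, base = TABLE[segment]
        extents.append((lun, base + within, length))
        block += length
        remaining -= length
    return extents


# A 12-block operation starting at virtual block 6 touches three segments.
print(split_io(6, 12))
```

Each returned extent corresponds to one I/O operation issued to a storage container; the results of those operations would then be collected and presented to the host as a single completed operation.
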
[0025] In addition to mapping information specifying
the storage location, each mapping table entry 210 also
contains several states. The states are Boolean variables
that provide information on the current status of the
virtual disk segment. These states are important because
they allow the table 200 stored in the agent 110
to be remotely loaded and manipulated from the controller
120. These states and interfaces provide the ability
for the mapping tables to be distributed and for the
mapping table entries to be volatile. The disclosure first
describes the states and then explains some of the
functions for the states. The table 200 includes at least
an invalid state 240 indicating whether any I/O operations
may occur on the virtual disk segment 220 and its
corresponding physical location 230. The invalid state
may be activated during a first I/O operation to prevent
further I/O operations until completion of the first I/O operation.
In a preferred embodiment, the table 200 further
includes a no-write (Nw) state 250 that indicates whether
the data contained at the corresponding physical location
230 may currently be changed. The Nw state 250
allows for improved storage system performance because
it permits data to be read during another I/O operation.
The invalid state 240 and the Nw state 250 function
during dynamic loading of mapping table entries,
dynamic mapping changes, volatility of mapping table
entries, and data sharing among similar virtual disks
150.

[0026] When activated, the invalid state 240 generally
indicates that the mapping table entry 210 contains no
useable mapping information and cannot support I/O
operations. Any attempt to implement an I/O operation
through this table entry 210 causes the mapping agent
110 to send a fault message to the controller 120. The
agent 110 does not proceed with the I/O operation until
the controller 120 returns a fault response. In one embodiment,
the system 100 initially activates the invalid
state 240 for all entries 210 in the table 200 when the
table 200 is newly created. In this way, the table 200
ignores any residual entries in memory from prior stored
tables to ensure that current entries are active and reliable.
Similarly, the invalid state 240 may be activated
when entry 210 is "forgotten" and lost by the mapping
agent 110 volatile memory. If the invalid state 240 is activated
in the entry 210, then all other values and states
in the entry 210 are assumed to contain no valid information
and are ignored.

[0027] Because the tables 200 located in the mapping
agents 110 are volatile, any failure or restart of the mapping
agents 110 causes all of the entries 210 to have an
active invalid state 240. A sustained loss of communication
between the controller 120 and mapping agent
110 also causes I/O operations to stop, either by making
all mapping table entries revert to an active invalid state
240 or by adding additional mechanisms to suspend I/O
operations until directed by the controller 120 to
resume I/O operations. This configuration allows the
controller 120 to continue coordinating other mapping
agents 110 by knowing that a failed or unreachable mapping
agent 110 has been placed into a known state, providing
the controller 120 high availability of data access
to the surviving mapping agents 110.
[0028] As presented above, the Nw state 250, when
activated, indicates that any write operations to the virtual
disk segment(s) 220 represented by the entry 210
cause the agent 110 to send a fault message to the controller
120. The agent 110 does not allow the host 140
to write to the storage locations 230 until the controller
120 returns a fault response to deactivate the Nw state
250 or until the system 100 otherwise takes some action
to write to a segment despite the active Nw state 250.
Unlike the invalid state 240, the activated Nw state 250
does not cause read operations to generate faults. Instead, the
agent 110 generally allows the host 140 to proceed to
access data at the storage location 230. Accordingly, if
only the Nw state is activated, table entry 210 must contain
a useable storage location 230.
[0029] In another embodiment, the table 200 further
includes a zero (Z) state 260. When active, the Z state
260 indicates that the virtual disk segment 220 represented
by the entry 210 contains all zero bytes. This feature
allows a virtual disk 150 to be created and appear
to be initialized without the need to allocate or adjust
any underlying non-virtual storage. If an entry 210 contains
an active Z state 260, the agent 110 ignores the
storage address 230. If the host 140 attempts to read
information stored at storage address 230, the agent
110 returns only zero-filled blocks regardless of the actual
contents of the storage address 230. On the other
hand, any attempts to write data at the storage address
230 when Z state 260 is active cause the agent 110 to
send a fault message to the controller 120. The agent
110 does not allow the host 140 to write to the storage
locations 230 until the controller 120 returns a fault response
that deactivates the Z state 260 or until the system
100 otherwise takes some action to write to a segment
despite the active Z state 260.
[0030] In another configuration, the mapping table
200 further includes an error (E) state 270. When active,
the E state 270 indicates the existence of an error condition
and provides the information necessary to instruct
the agent to return an error without disrupting any previous
state. The E state 270 is used where a pre-existing
failure is known and such failure would cause any attempts
at I/O access to fail. It should be noted, however,
that the E state 270 could also be used as the means to
issue an error status from a mapping fault. If an entry
210 contains an active E state 270, the agent 110 ignores
the storage address 230. If the host 140 attempts
to read from or write to the storage address 230, the
agent 110 returns an error to the host 140.
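
The behavior of the four states described above may be summarized, purely as an illustrative sketch, by a single decision function in the mapping agent. The names Action, Entry, and decide are hypothetical. The disclosure states that an active invalid state causes all other states to be ignored; the relative precedence of the error state over the zero state in this sketch is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"  # issue the I/O to the storage location
    FAULT = "fault"      # send a fault message to the controller and wait
    ZEROS = "zeros"      # return zero-filled blocks without touching storage
    ERROR = "error"      # return an error to the host


@dataclass
class Entry:
    invalid: bool = False   # invalid state 240
    no_write: bool = False  # Nw state 250
    zero: bool = False      # Z state 260
    error: bool = False     # E state 270


def decide(entry: Entry, is_write: bool) -> Action:
    """Choose what the agent does for one I/O against one table entry."""
    if entry.invalid:
        return Action.FAULT   # all other values and states are ignored
    if entry.error:
        return Action.ERROR   # assumed to take precedence over Z and Nw
    if entry.zero:
        return Action.FAULT if is_write else Action.ZEROS
    if entry.no_write and is_write:
        return Action.FAULT
    return Action.PROCEED


print(decide(Entry(no_write=True), is_write=False))  # reads pass through
print(decide(Entry(zero=True), is_write=True))       # writes to a zero segment fault
```
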
[0031] The interaction of the agent 110 and the con- 
troller 120 is now described in greater detail. In one cat- 
egory of interactions, fault/response operations, the 
agent 110 sends a message to the controller 120 indi- 
cating the occurrence of a fault during an I/O operation 
to the table 200. Typically, the fault occurs as a result of 
an activated state (as described above) that prevents 
the execution of the I/O operation by the agent. The 
agent 110 sends the fault message to the controller
120. The controller then determines an appropriate action
and commands the agent 110 accordingly.
[0032] In one type of fault/response operation, a map
fault, the mapping agent 110 alerts the controller 120
that an I/O operation requested by the host 140 cannot
be completed because the mapping table entry 210 has
an activated state that prevents the completion of the
requested I/O operation. For example, the mapping
agent 110 produces a fault message to the controller
120 in response to a request for any I/O operation to a
table entry 210 having an active invalid flag 240 or an
attempt to write to storage address 230 having an active
corresponding Nw flag 250. The map fault message
from the agent 110 generally identifies the requested I/O
operation, the virtual disk segment 220 involved, and
the table state preventing the I/O operation. When the
fault occurs, the agent does not attempt to carry out the
I/O operation. Instead, the controller 120 uses the fault
message to respond to the faulted I/O operation (e.g.,
load map entry, change map entry, delay until some other
operation has completed). The controller 120 response
informs the mapping agent 110 how to proceed
to overcome the cause for the fault.
[0033] The controller 120 generally instructs the
agent 110 either to resolve the problem or to send an
error message to the requesting host. When resolving
the problem, the controller 120 sends a replacement table
entry 210. The agent 110 then inserts the new table
entry 210 in the table (in place of the former faulty entry)
and then retries the I/O operation. If the controller 120 cannot
resolve the problem, it instructs the agent 110 to issue
an error message to the host. To cause the agent
110 to issue an error message, the controller instructs
the agent to activate the error state 270 for the table
entry 210 causing the fault. As described above, the
agent 110 then issues an error message to the host 140
regardless of the other contents of the table entry 210.
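
A minimal sketch of the map fault exchange, under stated assumptions, is given below. Table entries are modeled as plain dictionaries, the controller is modeled as a callable named ask_controller that returns either a replacement entry or None, and the max_faults bound is an addition of this sketch to keep the retry loop finite; none of these names or choices come from the disclosure.

```python
from typing import Callable, Dict, Optional

Entry = dict  # e.g. {"lun": 1, "offset": 0}; the layout is illustrative only


def handle_io(table: Dict[int, Entry], segment: int,
              ask_controller: Callable[[int], Optional[Entry]],
              max_faults: int = 3) -> str:
    """Perform one I/O against a segment, faulting to the controller as needed."""
    for _ in range(max_faults + 1):
        # An entry missing from volatile memory is treated as invalid.
        entry = table.get(segment, {"invalid": True})
        if entry.get("error"):
            return "error"                      # report the error to the host
        if not entry.get("invalid"):
            return f"lun={entry['lun']} block={entry['offset']}"
        replacement = ask_controller(segment)   # map fault; wait for the response
        if replacement is None:                 # the controller cannot resolve it
            table[segment] = {"error": True}
        else:
            table[segment] = replacement        # replace the faulting entry, retry
    return "error"


# The controller's persistent copy stands in for the fault handler here.
persistent = {0: {"lun": 4, "offset": 128}}
agent_table: Dict[int, Entry] = {}
print(handle_io(agent_table, 0, persistent.get))  # loaded on first use
print(handle_io(agent_table, 9, persistent.get))  # unknown segment
```

After the first call the entry resides in the agent's table, so later operations on the same segment proceed without contacting the controller.
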
[0034] Commands to the agent 110 initiated by the
controller 120 comprise a second category of interactions,
command/response operations. The
commands initiated by the controller 120 include the
creation of a new mapping table 200 with all entries set
to have an active invalid flag or the deletion of an existing
table 200. The controller 120 may obtain from the agent
110 the contents of one of the entries 210 or the status
of one of the states in this entry 210. The controller
120 can further order the agent 110 to set all of the contents
for one of the entries 210 or the status of one of
the states for the entry 210. It should be noted that once
the invalid state 240, the error state 270, or the zero
state 260 is active, the controller 120 cannot deactivate
the state because, as described above, initial activation
of these states voids the storage address 230. To
deactivate these states, the controller 120 must instruct
the agent 110 to replace the existing entry 210 with an
entirely new entry. To each of these commands, the
agent 110 returns a response to the controller 120 after
completing the ordered task.
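
The restriction on deactivating the invalid, error, and zero states might be enforced in an agent along the following lines. This is a sketch only: the names STICKY, set_state, and replace_entry and the dictionary representation of an entry are assumptions made for illustration.

```python
# States whose activation voids the storage address (invalid, error, zero).
STICKY = {"invalid", "error", "zero"}


def set_state(entry: dict, name: str, value: bool) -> dict:
    """Return a copy of entry with one state changed.

    Clearing a sticky state is refused, because the storage address in the
    entry can no longer be trusted once such a state has been activated.
    """
    if name in STICKY and entry.get(name) and not value:
        raise ValueError(f"{name} can only be cleared by replacing the whole entry")
    updated = dict(entry)
    updated[name] = value
    return updated


def replace_entry(table: dict, segment: int, new_entry: dict) -> None:
    """Install an entirely new entry, which is how sticky states are cleared."""
    table[segment] = dict(new_entry)


table = {3: {"zero": True}}
replace_entry(table, 3, {"lun": 1, "offset": 64})
print(table[3])
```
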

[0035] When the controller 120 instructs the agent to
either set or obtain information from the table 200, the
system optimally allows the controller 120 to specify
multiple, contiguous map table entries 210 in a single
command. This arrangement allows the agent 110 and
the controller 120 to interact more efficiently and with
fewer instructions.

[0036] When the controller 120 commands the agent
110 to set one or all of the values and states in the table
entry 210, the controller 120 command to the agent 110
optimally includes a blocking flag. The blocking flag instructs
the agent 110 to delay responding to the controller
120 command until after the completion of any I/O
operations initiated before the command. The blocking
flag applies to only that command in which it resides.
Other concurrent commands and subsequent commands
are not generally affected by the blocking flag of
one command. In particular, the agent 110 immediately
changes the table 200, as instructed in the command,
but does not notify the controller 120 of this change until
completing all previously existing I/O operations. In this
way, the agent 110 signals to the controller 120 the completion
of all I/O operations using the former table 200
that do not reflect the changes to the table specified in
the command.
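
The blocking flag may be illustrated, without limitation, by the following single-threaded sketch. The class Agent, its method names, and the tuple used as a response are hypothetical; a real agent would track outstanding operations concurrently. The sketch shows the two properties stated above: the table changes immediately, and the response is withheld only for I/O operations that were started before the command.

```python
class Agent:
    def __init__(self):
        self.table = {}
        self.in_flight = set()  # ids of I/O operations started and not yet finished
        self.pending = []       # (ids still awaited, response) for blocked commands
        self.responses = []     # responses delivered to the controller

    def start_io(self, io_id):
        self.in_flight.add(io_id)

    def finish_io(self, io_id):
        self.in_flight.discard(io_id)
        still_waiting = []
        for waiting, response in self.pending:
            waiting.discard(io_id)
            if waiting:
                still_waiting.append((waiting, response))
            else:
                self.responses.append(response)  # all earlier I/O has drained
        self.pending = still_waiting

    def set_entry(self, segment, entry, blocking=False):
        self.table[segment] = entry  # the table changes immediately
        response = ("set-done", segment)
        # Only I/O started before this command is awaited.
        waiting = set(self.in_flight) if blocking else set()
        if waiting:
            self.pending.append((waiting, response))
        else:
            self.responses.append(response)


agent = Agent()
agent.start_io(1)
agent.set_entry(0, {"lun": 1}, blocking=True)
agent.start_io(2)       # started after the command, so it is not awaited
agent.finish_io(1)
print(agent.responses)  # the response is delivered once I/O 1 completes
```
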

[0037] During the majority of its operation, the mapping
agent 110 operates without faults. In non-fault cases, i.e.,
when the mapping table entries 210 are valid and do not
have any active states that prevent the requested I/O
operation, the virtual disk I/O operates entirely through
the mapping agent 110. Thus, all I/O operations proceed
through the mapping table 200 and directly to the physical
storage containers 160 without any involvement of
the controller 120. As a result, the controller 120 inserts
itself into an I/O stream only when needed to perform
various management operations and typically does not
become involved in nonfaulting cases, allowing the
system 100 to have high performance and scalability.
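
A minimal Python sketch of this fast path is given below. It assumes, for illustration only, that a fault arises when an entry is missing, when its invalid state is active, or when a write targets an entry whose no-write state is active; the function name handle_io, the dictionary layout of an entry, and the fault_to_controller callback are hypothetical.

```python
def handle_io(table, segment, op, fault_to_controller):
    """Resolve one I/O through the agent's table; fault only when a state blocks it."""
    entry = table.get(segment)
    blocked = (
        entry is None
        or entry["invalid"]
        or (op == "write" and entry["no_write"])
    )
    if blocked:
        # Fault case: the controller becomes involved.
        return fault_to_controller(segment, op)
    # Non-fault case: go straight to storage, no controller involvement.
    return ("storage", entry["address"])
```

In the non-fault branch the callback is never invoked, which mirrors the statement that the controller 120 stays out of the I/O stream unless needed.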
[0038] The foregoing description of the preferred em- 
bodiments of the invention has been presented for the 
purposes of illustration and description. It is not intended 
to be exhaustive or to limit the invention to the precise 
form disclosed. Many modifications and variations are 
possible in light of the above teaching. It is intended that 
the scope of the invention be limited not by this detailed 
description, but rather by the claims appended hereto. 
The above specification, examples and data provide a 
complete description of the manufacture and use of the 
composition of the invention. Since many embodiments 
of the invention can be made without departing from the 
spirit and scope of the invention, the invention resides 
in the claims hereinafter appended. 



Claims 

1. A virtual storage system for linking a host to one or
more storage devices over a network, the system
comprising:

an agent connected to the host, the agent hav- 
ing volatile memory for storing a first copy of a 
table, the table having entries to map virtual 
disk positions to locations on the storage devic- 
es; and 

a controller coupled to the agent, the controller 
having non-volatile memory for storing a sec- 
ond copy of the table, the controller intermit- 
tently causing contents of the first copy of the 
table to be replaced by contents of the second 
copy of the table, 

whereby during an input/output (I/O) operation, 
the host accesses one of the entries in the table 
stored on the agent to determine one of the 
storage device locations. 

2. The system of claim 1, wherein the table entries
further include an indication of whether an invalid state
is activated such that the invalid state for a table
entry becomes activated when that table entry
contains no useable mapping information.

3. The system of claim 2, wherein the agent does not 
allow the host to complete the I/O operations with 
one of the entries if the invalid state for that entry is 
activated. 

4. The system of claim 1, wherein the table entries
further include an indication of whether a no-write state
is activated such that the no-write state for one of
the entries becomes activated when data cannot be
written to the storage location contained in that entry.

5. The system of claim 4, wherein the agent does not
allow the host to write data to the storage location
in one of the entries if the no-write state for that entry
is activated.

6. The system of claim 1, further comprising a
communication channel to couple the agent and the
controller.

7. The system of claim 6, wherein the communication 
channel employs a data transfer protocol to trans- 
port messages on the communication channel. 


8. The system of claim 1, wherein the entries include
an offset.

9. The system of claim 8, wherein the offset includes
a logic unit number identifier.

10. The system of claim 8, wherein the offset includes 
a block identifier. 

11. The system of claim 10, wherein the entries further
include a segment of virtual disk positions.

12. A system for mapping a virtual disk segment to a
storage location within a storage device, such that
a host issues an I/O operation to an agent and the
agent determines said storage location for input/output
operations, said system comprising:

a table having an entry corresponding to said
storage location;

a plurality of variables indicating states of the
entry;

an offset for the entry, wherein the offset includes
a logic unit number identifier and a block
identifier; and

a memory to store the table.

13. The system of claim 12, wherein the memory is
volatile.


14. The system of claim 12, wherein said storage loca- 
tion comprises a block of data within the storage de- 
vice. 

15. The system of claim 14, wherein the block of data
is about 1 MB.

16. The system of claim 12, wherein the agent is cou- 
pled to the host. 


17. The system of claim 12, wherein the plurality of
variables comprises Boolean variables.






EP 1 178 407 A2






18. The system of claim 12, wherein the states include 
an invalid state. 

19. The system of claim 18, wherein the plurality of
variables includes a variable for the invalid state.

20. The system of claim 12, wherein the states include
a no-write state.

21. The system of claim 20, wherein the plurality of
variables includes a variable for the no-write state.

22. The system of claim 12, wherein the states include 
a zero state. 

23. The system of claim 12, wherein the states include 
an error state. 

24. A method for performing an operation on a virtual
disk coupled to a host within a network, comprising:

specifying a block on the virtual disk within the
operation;

accessing a table mapping the block to a storage
location on a storage device;

issuing a corresponding operation to the storage
device, wherein the corresponding operation
correlates to the operation on the virtual
disk;

completing the corresponding operation; and

presenting the completed corresponding operation
to the virtual disk.

25. The method of claim 24, wherein the issuing step 
includes issuing the corresponding operation from 
an agent coupled to the host. 

26. The method of claim 24, further comprising updat- 
ing the table with a persistently-stored table residing 
in a non-volatile memory. 

27. The method of claim 24, further comprising deter- 
mining states of the table. 

28. The method of claim 24, further comprising sending 
a fault message when the table is unable to be ac- 
cessed. 

29. The method of claim 24, further comprising storing 
the table in a volatile memory. 

30. The method of claim 24, further comprising receiving
updates for the table from a controller.

31. A method for maintaining a table for mapping virtual
disk blocks to storage locations on storage devices
within a network, comprising:

receiving a command from a controller at an
agent storing the table;

activating states within entries of the table;

completing operations at the table; and

updating the table in response to the command.

32. The method of claim 31, further comprising setting
a blocking flag until operations are completed.

33. The method of claim 31, further comprising obtaining
mapping information from one of the entries of
the table.

34. A computer program product comprising a computer
useable medium having computer readable
code embodied therein for performing an operation
on a virtual disk coupled to a host within a network,
the computer program product adapted when run
on a computer to effect steps including:

specifying a block on the virtual disk within the
operation;

accessing a table mapping the block to a storage
location on a storage device;

issuing a corresponding operation to the storage
device, wherein the corresponding operation
correlates to the operation on the virtual
disk;

completing the corresponding operation; and

presenting the completed corresponding operation
to the virtual disk.

35. A computer program product comprising a computer
useable medium having computer readable
code embodied therein for maintaining a table for
mapping virtual disk blocks to storage locations on
storage devices within a network, the computer program
product adapted when run on a computer to
effect steps including:

receiving a command from a controller at an
agent storing the table;

activating states within entries of the table;

completing operations at the table; and

updating the table in response to the command.